Avoid SeqCst or static mut in mach_timebase_info and QueryPerformanceFrequency caches #77727

thomcc · 2020-10-08T22:04:09Z

This patch went through a couple iterations but the end result is replacing a pattern where an AtomicUsize (updated with many SeqCst ops) guards a static mut with a single AtomicU64 that is known to use 0 as a value indicating that it is not initialized.

The code in both places exists to cache values used in the conversion of Instants to Durations on macOS, iOS, and Windows.

I have no numbers to prove that this improves performance (It seems a little futile to benchmark something like this), but it's much simpler, safer, and in practice we'd expect it to be faster everywhere where Relaxed operations on AtomicU64 are cheaper than SeqCst operations on AtomicUsize, which is a lot of places.

Anyway, it also removes a bunch of unsafe code and greatly simplifies the logic, so IMO that alone would be worth it unless it was a regression.

If you want to take a look at the assembly output though, see https://godbolt.org/z/rbr6vn for x86_64, https://godbolt.org/z/cqcbqv for aarch64 (Note that this just the output of the mac side, but i'd expect the windows part to be the same and don't feel like doing another godbolt for it). There are several versions of this function in the godbolt:

info_new: version in the current patch
info_less_new: version in initial PR
info_original: version currently in the tree
info_orig_but_better_orderings: a version that just tries to change the original code's orderings from SeqCst to the (probably) minimal orderings required for soundness/correctness.

The biggest concern I have here is if we can use AtomicU64, or if there are targets that dont have it that this code supports. AFAICT: no. (If that changes in the future, it's easy enough to do something different for them)

r? @Amanieu because he caught a couple issues last time I tried to do a patch reducing orderings 😅

I rewrote this whole message so the original is inside here

I happened to notice the code we use for caching the result of mach_timebase_info uses SeqCst exclusively.

However, thinking a little more, it's actually pretty easy to avoid the static mut by packing the timebase info into an AtomicU64.

This entirely avoids needing to do the compare_exchange. The AtomicU64 can be read/written using Relaxed ops, which on current macos/ios platforms (x86_64/aarch64) have no overhead compared to direct loads/stores. This simplifies the code and makes it a lot safer too.

I have no numbers to prove that this improves performance (It seems a little futile to benchmark something like this), although it should do that on both targets it applies to.

That said, it also removes a bunch of unsafe code and simplifies the logic (arguably at least — there are only two states now, initialized or not), so I think it's a net win even without concrete numbers.

If you want to take a look at the assembly output though, see below. It has the new version, the original, and a version of the original with lower Orderings (which is still worse than the version in this PR)

godbolt.org/z/obfqf9 x86_64-apple-darwin
godbolt.org/z/Wz5cWc aarch64-unknown-linux-gnu (godbolt can't do aarch64-apple-ios but that doesn't matter here)

A different (and more efficient) option than this would be to just use the AtomicU64 and use the knowledge that after initialization the denominator should be nonzero... That felt like it's relying on too many things I'm not confident in, so I didn't want to do that.

library/std/src/sys/unix/time.rs

thomcc · 2020-10-08T22:24:27Z

Ah, apparently the same pattern exists on windows too: https://github.com/rust-lang/rust/blob/master/library/std/src/sys/windows/time.rs#L199-L221

LARGE_INTEGER is 64 bits, I don't know if we can rely on AtomicU64 on all windows platforms (nor am I confident that relaxed AtomicU64 ops would be a net win on them where it is), but using weaker Orderings is still possible there.

m-ou-se · 2020-10-08T22:37:24Z

A different (and more efficient) option than this would be to just use the AtomicU64 and use the knowledge that after initialization the denominator should be nonzero... That felt like it's relying on too many things I'm not confident in, so I didn't want to do that.

The two places that call this info() function directly use numer (in one case) or denom (in the other case) as the right hand side of a division and remainder operator. So I think it's pretty safe to use 0 as a special 'empty' value for the AtomicU64. (If mach_timebase_info gives a zero, this division would panic right away.)

thomcc · 2020-10-09T00:13:03Z

So I think it's pretty safe to use 0 as a special 'empty' value for the AtomicU64. (If mach_timebase_info gives a zero, this division would panic right away.)

Yeah good point. I had also not thought about the fact that that this lets us use Relaxed ops.

The cached value on Windows is also nonzero too (it explicitly can't return 0 on any system >= XP, and the current code would panic if it did).

So I did this for both. Thanks @m-ou-se.

The remaining concern then I guess is for targets that run ios, macos, or windows, but don't support 64-bit atomics. Which according --print=target-list and --print=cfg these don't seem to be exist (and if they do, libstd probably doesn't work on them without other patches anyway).

library/std/src/sys/windows/time.rs

Amanieu · 2020-10-10T15:39:03Z

At the moment all platforms that mac and windows support will support AtomicU64, so I think it's fine to depend on it.

@bors r+

bors · 2020-10-10T15:39:04Z

📌 Commit 4f37220 has been approved by Amanieu

bors · 2020-10-11T03:34:00Z

⌛ Testing commit 4f37220 with merge 4a89ee5cab4196d6de65d43896fa19a6fce1a4b6...

bors · 2020-10-11T03:54:19Z

💔 Test failed - checks-actions

m-ou-se · 2020-10-11T13:27:45Z

💔 Test failed - checks-actions

This was a spurious error:

warning: spurious network error (2 tries remaining): failed to get 200 response from `https://crates.io/api/v1/crates/crossbeam-utils/0.6.6/download`, got 503
warning: spurious network error (1 tries remaining): failed to get 200 response from `https://crates.io/api/v1/crates/crossbeam-utils/0.6.6/download`, got 503
error: failed to download from `https://crates.io/api/v1/crates/crossbeam-utils/0.6.6/download`

Caused by:
  failed to get 200 response from `https://crates.io/api/v1/crates/crossbeam-utils/0.6.6/download`, got 503

Amanieu · 2020-10-11T14:05:59Z

@bors retry

bors · 2020-10-11T14:06:12Z

⌛ Testing commit 4f37220 with merge bc74dd7...

bors · 2020-10-11T15:56:53Z

☀️ Test successful - checks-actions, checks-azure
Approved by: Amanieu
Pushing bc74dd7 to master...

Avoid SeqCst or static mut in mach_timebase_info cache

f30cc74

rust-highfive assigned Amanieu Oct 8, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 8, 2020

thomcc commented Oct 8, 2020

View reviewed changes

library/std/src/sys/unix/time.rs Outdated Show resolved Hide resolved

Fiddle with the comments

e4cf24b

camelid added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Oct 8, 2020

thomcc added 2 commits October 8, 2020 17:03

Switch to using a single atomic and treating 0 as 'uninitialized'

59c06e9

Implement the same optimization in windows/time

4f37220

thomcc changed the title ~~Avoid SeqCst or static mut in mach_timebase_info cache~~ Avoid SeqCst or static mut in mach_timebase_info and QueryPerformanceFrequency caches Oct 9, 2020

jyn514 added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Oct 9, 2020

ollie27 reviewed Oct 9, 2020

View reviewed changes

library/std/src/sys/windows/time.rs Show resolved Hide resolved

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 10, 2020

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Oct 11, 2020

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 11, 2020

bors added the merged-by-bors This PR was explicitly merged by bors. label Oct 11, 2020

bors merged commit bc74dd7 into rust-lang:master Oct 11, 2020

rustbot added this to the 1.49.0 milestone Oct 11, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid SeqCst or static mut in mach_timebase_info and QueryPerformanceFrequency caches #77727

Avoid SeqCst or static mut in mach_timebase_info and QueryPerformanceFrequency caches #77727

thomcc commented Oct 8, 2020 •

edited

Loading

thomcc commented Oct 8, 2020

m-ou-se commented Oct 8, 2020 •

edited

Loading

thomcc commented Oct 9, 2020

Amanieu commented Oct 10, 2020

bors commented Oct 10, 2020

bors commented Oct 11, 2020

bors commented Oct 11, 2020

m-ou-se commented Oct 11, 2020

Amanieu commented Oct 11, 2020

bors commented Oct 11, 2020

bors commented Oct 11, 2020

Avoid SeqCst or static mut in mach_timebase_info and QueryPerformanceFrequency caches #77727

Avoid SeqCst or static mut in mach_timebase_info and QueryPerformanceFrequency caches #77727

Conversation

thomcc commented Oct 8, 2020 • edited Loading

thomcc commented Oct 8, 2020

m-ou-se commented Oct 8, 2020 • edited Loading

thomcc commented Oct 9, 2020

Amanieu commented Oct 10, 2020

bors commented Oct 10, 2020

bors commented Oct 11, 2020

bors commented Oct 11, 2020

m-ou-se commented Oct 11, 2020

Amanieu commented Oct 11, 2020

bors commented Oct 11, 2020

bors commented Oct 11, 2020

thomcc commented Oct 8, 2020 •

edited

Loading

m-ou-se commented Oct 8, 2020 •

edited

Loading